Advanced Geospatial Data Processing for Social Scientists
Dennis Abel & Stefan Jünger
2025-04-29
Now
Day
Time
Title
April 28
10:00-11:15
Introduction
April 28
11:15-11:30
Coffee Break
April 28
11:30-13:00
Raster data in R
April 28
13:00-14:00
Lunch Break
April 28
14:00-15:15
Raster data processing
April 28
15:15-15:30
Coffee Break
April 28
15:30-17:00
Graphical display of raster data in maps
April 29
10:00-11:15
Datacube processing I
April 29
11:15-11:30
Coffee Break
April 29
11:30-13:00
Datacube processing II & API access
April 29
13:00-14:00
Lunch Break
April 29
14:00-15:15
Data integration and linking (with survey data)
April 29
15:15-15:30
Coffee Break
April 29
15:30-17:00
Outlook and open session with own application
Datacubes
Yesterday, we worked with single layers of raster data. Today, we turn towards the more complex raster stacks or datacubes. Essentially we add one or more dimension(s) to our data object. This increases complexity - for humans and machines.
Our goal for today is to:
Understand the logic of raster stacks and datacubes
Be able to stack multiple raster layers into a coherent raster stack or datacube
Be able to navigate and process the spatial and temporal dimensions
Familiarize ourselves with the stars and terra packages for raster processing
If you want to do anything in R, you need to use functions, and functions are provided through packages.
terra (and its predecessor raster) by Robert Hijmans is probably the most commonly used package for raster data in R (and equally relevant for vector data).
Methods for vector data include geometric operations such as intersect and buffer. Raster methods include local, focal, global, zonal and geometric operations. The predict and interpolate methods facilitate the use of regression type (interpolation, machine learning) models for spatial prediction, including with satellite remote sensing data.
The authors have produced a very extensive tutorial which is basically a textbook for spatial data science in R.
Most important EOD packages 📦
If you want to do anything in R, you need to use functions, and functions are provided through packages.
stars by Edzer Pebesma is equally useful when handling spatiotemporal arrays (datacubes).
This R package provides classes and methods for reading, manipulating, plotting and writing such data cubes, to the extent that there are proper formats for doing so.
The stars syntax follows the logic of the sf-package (also by Edzer Pebesma) and together they can be seen as the emergence of a “spatial tidyverse”.
Edzer Pebesma’s and Roger Bivand’s Spatial Data Science textbook is THE go-to resource for learning the sf and stars syntax.
Similarities and differences between terra and stars
With terra, we create three-dimensional stacks of raster layers, were the third dimension is often (not always) time - in contrast, stars objects can be multi-dimensional and can store more dimensions (e.g. several variables)
Major difference: terra doesn’t treat the third dimension (time or band) as a “dimension” in the same way stars does.
Time is encoded via layer names or separate metadata in terra objects — you manage it manually.
Informally we will call SpatRaster objects created with terra “raster stacks” and stars objects “raster cubes”.
Stacking raster layers
We will start by manually creating our raster stacks/cubes by combining several layers. We will first show our terra-approach and afterwards replicate the process with stars. These steps include:
Combination of separate layers into one object
Adjustments to metadata and global attributes like:
Time dimension
CRS
The output will be a clean raster stack/cube for further processing/analysis.
terra approach: stacking raster layers
We load four layers of population data for California for the years 2017-2020.
When working with a yearly resolution, it is often consensus to set the date to the first of January. If you want to avoid confusion and explicitly assign only the year, you can do that by adjusting the tstep= argument in the terra::time-function.
When working with a yearly resolution, it is often consensus to set the date to the first of January. If you want to avoid confusion and explicitly assign only the year, you can do that by adjusting the tstep= argument in the terra::time-function.
# Have a first glance at the dataterra::plot(CA_pop_stack)
terra approach: stacking raster layers
By default, the variable name is derived from the file-name. We can replace it with a simpler version.
There are also options to adjust short and long variable names.
# Optional - setting variable namesvarnames(CA_pop_stack) <-"population"longnames(CA_pop_stack) <-"California population estimate (1 km grid)"print(CA_pop_stack)
class : SpatRaster
dimensions : 1136, 1234, 4 (nrow, ncol, nlyr)
resolution : 0.008333333, 0.008333333 (x, y)
extent : -124.4179, -114.1346, 32.54125, 42.00792 (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326)
sources : US-CA_ppp_2017_1km.tif
US-CA_ppp_2018_1km.tif
US-CA_ppp_2019_1km.tif
US-CA_ppp_2020_1km.tif
varnames : population (California population estimate (1 km grid))
population (California population estimate (1 km grid))
population (California population estimate (1 km grid))
...
names : pop_2017, pop_2018, pop_2019, pop_2020
time (years): 2017 to 2020
terra approach: stacking raster layers
Although in practice, we often don’t do it, there is also the option to explicitly set the value unit.
# Setting unitsunits(CA_pop_stack)
[1] "" "" "" ""
units(CA_pop_stack) <-"count"units(CA_pop_stack)
[1] "count" "count" "count" "count"
terra approach: stacking raster layers
For a quick glance at the distribution of the data, it is helpful to add minimum and maximum values to the metadata. By default not displayed because first reading raster from disk just creates a pointer object which does not read all cell values.
class : SpatRaster
dimensions : 1136, 1234, 4 (nrow, ncol, nlyr)
resolution : 0.008333333, 0.008333333 (x, y)
extent : -124.4179, -114.1346, 32.54125, 42.00792 (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326)
sources : US-CA_ppp_2017_1km.tif
US-CA_ppp_2018_1km.tif
US-CA_ppp_2019_1km.tif
US-CA_ppp_2020_1km.tif
varnames : population (California population estimate (1 km grid))
population (California population estimate (1 km grid))
population (California population estimate (1 km grid))
...
names : pop_2017, pop_2018, pop_2019, pop_2020
min values : 0.00, 0, 0.00, 0.00
max values : 29224.54, 29316, 29433.52, 29535.46
unit : count, count, count, count
time (years): 2017 to 2020
terra approach: stacking raster layers
With the crs-function, we can directly access the CRS of the object. You can assign a label to the crs of the stack (for example in case it does not have one). IMPORTANT: Keep in mind that this DOES NOT reproject the data.
crs(CA_pop_stack)
[1] "GEOGCRS[\"WGS 84\",\n ENSEMBLE[\"World Geodetic System 1984 ensemble\",\n MEMBER[\"World Geodetic System 1984 (Transit)\"],\n MEMBER[\"World Geodetic System 1984 (G730)\"],\n MEMBER[\"World Geodetic System 1984 (G873)\"],\n MEMBER[\"World Geodetic System 1984 (G1150)\"],\n MEMBER[\"World Geodetic System 1984 (G1674)\"],\n MEMBER[\"World Geodetic System 1984 (G1762)\"],\n MEMBER[\"World Geodetic System 1984 (G2139)\"],\n MEMBER[\"World Geodetic System 1984 (G2296)\"],\n ELLIPSOID[\"WGS 84\",6378137,298.257223563,\n LENGTHUNIT[\"metre\",1]],\n ENSEMBLEACCURACY[2.0]],\n PRIMEM[\"Greenwich\",0,\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n CS[ellipsoidal,2],\n AXIS[\"geodetic latitude (Lat)\",north,\n ORDER[1],\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n AXIS[\"geodetic longitude (Lon)\",east,\n ORDER[2],\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n USAGE[\n SCOPE[\"Horizontal component of 3D system.\"],\n AREA[\"World.\"],\n BBOX[-90,-180,90,180]],\n ID[\"EPSG\",4326]]"
# crs(CA_pop_stack) <- "EPSG:4326"
terra approach: stacking raster layers
If you want to reproject the data, you can utilize project as before. Here, we transform to NAD83 / California Albers (EPSG:3310).
CA_pop_stack <- terra::project(CA_pop_stack, y ="EPSG:3310",method ="bilinear" )print(CA_pop_stack)
class : SpatRaster
dimensions : 1287, 1168, 4 (nrow, ncol, nlyr)
resolution : 828.2603, 828.2603 (x, y)
extent : -415531, 551877.1, -607609.6, 458361.4 (xmin, xmax, ymin, ymax)
coord. ref. : NAD83 / California Albers (EPSG:3310)
source(s) : memory
names : pop_2017, pop_2018, pop_2019, pop_2020
min values : 0.0, 0.00, 0.00, 0.00
max values : 22052.6, 22123.96, 22208.66, 22285.26
unit : count, count, count, count
time (years): 2017 to 2020
terra approach: stacking raster layers
If you want to reproject the data, you can utilize project as before. Here, we transform to NAD83 / California Albers (EPSG:3310).
# Have a first glance at the dataterra::plot(CA_pop_stack)
Note on reprojecting raster data
Transforming (projecting) raster data is fundamentally different from transforming vector data. Vector data can be transformed and back-transformed without loss in precision and without changes in the values. This is not the case with raster data. In each transformation the values for the new cells are estimated in some fashion. Therefore, if you need to match raster and vector data for analysis, you should generally transform the vector data.
terra approach: stacking raster layers
Now that we have properly prepared variable names and times, we have several options to access single layers.
# Indexing by layer index numberCA_pop_stack[[1]]
class : SpatRaster
dimensions : 1287, 1168, 1 (nrow, ncol, nlyr)
resolution : 828.2603, 828.2603 (x, y)
extent : -415531, 551877.1, -607609.6, 458361.4 (xmin, xmax, ymin, ymax)
coord. ref. : NAD83 / California Albers (EPSG:3310)
source(s) : memory
name : pop_2017
min value : 0.0
max value : 22052.6
unit : count
time (years): 2017
# Indexing by layer nameCA_pop_stack["pop_2018"]
class : SpatRaster
dimensions : 1287, 1168, 1 (nrow, ncol, nlyr)
resolution : 828.2603, 828.2603 (x, y)
extent : -415531, 551877.1, -607609.6, 458361.4 (xmin, xmax, ymin, ymax)
coord. ref. : NAD83 / California Albers (EPSG:3310)
source(s) : memory
name : pop_2018
min value : 0.00
max value : 22123.96
unit : count
time (years): 2018
# Indexing by time pointCA_pop_stack["2019"]
class : SpatRaster
dimensions : 1287, 1168, 1 (nrow, ncol, nlyr)
resolution : 828.2603, 828.2603 (x, y)
extent : -415531, 551877.1, -607609.6, 458361.4 (xmin, xmax, ymin, ymax)
coord. ref. : NAD83 / California Albers (EPSG:3310)
source(s) : memory
name : pop_2019
min value : 0.00
max value : 22208.66
unit : count
time (years): 2019
terra approach: stacking raster layers
Now that we have properly prepared variable names and times, we have several options to access single layers.
# Plot a single layerterra::plot(CA_pop_stack[[1]])
terra approach: stacking raster layers
Exporting your object is the same for raster stacks.
stars object with 2 dimensions and 1 attribute
attribute(s), summary of first 1e+05 cells:
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
US-CA_ppp_2017_1km.tif 0 0 0 1.884433 0.09498084 1906.748 60820
dimension(s):
from to offset delta refsys point x/y
x 1 1234 -124.4 0.008333 WGS 84 FALSE [x]
y 1 1136 42.01 -0.008333 WGS 84 FALSE [y]
stars object with 2 dimensions and 4 attributes
attribute(s), summary of first 1e+05 cells:
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
17 0 0 0 1.884433 0.09498084 1906.748 60820
18 0 0 0 1.889280 0.09474182 1919.257 60820
19 0 0 0 1.894074 0.09462410 1933.029 60820
20 0 0 0 1.898908 0.09593478 1953.220 60820
dimension(s):
from to offset delta refsys point x/y
x 1 1234 -124.4 0.008333 WGS 84 FALSE [x]
y 1 1136 42.01 -0.008333 WGS 84 FALSE [y]
stars approach: stacking raster layers
We want to integrate the values into one attribute since it is the same variable.
In order to do that, we need to create the time-dimension first.
dates <-as.Date(paste0(2017:2020, "-01-01"))
stars approach: stacking raster layers
We can apply the merge-function on our existing datacube to integrate the four attributes into one by supplying the dates:
cube_merged <-merge(CA_pop_cube, f = dates, name ="time")print(cube_merged)
stars object with 3 dimensions and 1 attribute
attribute(s), summary of first 1e+05 cells:
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
17.18.19.20 0 0 0 1.884433 0.09498084 1906.748 60820
dimension(s):
from to offset delta refsys point x/y
x 1 1234 -124.4 0.008333 WGS 84 FALSE [x]
y 1 1136 42.01 -0.008333 WGS 84 FALSE [y]
f 1 4 2017-01-01 365 days Date NA
stars approach: stacking raster layers
A more straightforward approach is to read the layers directly into one object based on the supplied date-variable:
files <-list.files("./data",pattern ="US-CA_ppp_20(17|18|19|20)_1km\\.tif$",full.names =TRUE)# Read layers along time dimensionCA_pop_cube <-read_stars(files, along =list(time = dates))print(CA_pop_cube)
stars object with 3 dimensions and 1 attribute
attribute(s), summary of first 1e+05 cells:
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
US-CA_ppp_2017_1km.tif 0 0 0 1.884433 0.09498084 1906.748 60820
dimension(s):
from to offset delta refsys point x/y
x 1 1234 -124.4 0.008333 WGS 84 FALSE [x]
y 1 1136 42.01 -0.008333 WGS 84 FALSE [y]
time 1 4 2017-01-01 365 days Date NA
stars approach: stacking raster layers
The st_dimensions()and st_set_dimensions() functions are super helpful to access and manipulate dimension information. For example, if we want to transform the date into POSIX and assign a timezone, we can do it like this:
stars object with 3 dimensions and 1 attribute
attribute(s), summary of first 1e+05 cells:
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
population 0 0 0 1.884433 0.09498084 1906.748 60820
dimension(s):
from to offset delta refsys point x/y
x 1 1234 -124.4 0.008333 WGS 84 FALSE [x]
y 1 1136 42.01 -0.008333 WGS 84 FALSE [y]
time 1 4 2017 1 NA NA
stars approach: stacking raster layers
Setting units in stars is not that straightforward and requires a “manual” and quite complicated workaround. Normally, we wouldn’t recommend that.
stars object with 3 dimensions and 1 attribute
attribute(s), summary of first 1e+05 cells:
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
population [count] 0 0 0 1.884433 0.09498084 1906.748 60820
dimension(s):
from to offset delta refsys point x/y
x 1 1234 -124.4 0.008333 WGS 84 FALSE [x]
y 1 1136 42.01 -0.008333 WGS 84 FALSE [y]
time 1 4 2017 1 NA NA
stars approach: stacking raster layers
Access to the CRS is easy with st_crs. The EPSG code is not supplied as string (“EPSG:XXXX”) but as numeric values.
# CRSst_crs(CA_pop_cube)
Coordinate Reference System:
User input: WGS 84
wkt:
GEOGCRS["WGS 84",
ENSEMBLE["World Geodetic System 1984 ensemble",
MEMBER["World Geodetic System 1984 (Transit)"],
MEMBER["World Geodetic System 1984 (G730)"],
MEMBER["World Geodetic System 1984 (G873)"],
MEMBER["World Geodetic System 1984 (G1150)"],
MEMBER["World Geodetic System 1984 (G1674)"],
MEMBER["World Geodetic System 1984 (G1762)"],
MEMBER["World Geodetic System 1984 (G2139)"],
MEMBER["World Geodetic System 1984 (G2296)"],
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ENSEMBLEACCURACY[2.0]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
USAGE[
SCOPE["Horizontal component of 3D system."],
AREA["World."],
BBOX[-90,-180,90,180]],
ID["EPSG",4326]]
# to label is missing - not to reproject# st_crs(CA_pop_cube) <- 4326
stars approach: stacking raster layers
We can reproject the data with st_warp(). Default of use_gdal argument is TRUE. Unfortunately, this will remove our time-dimension metadata assigned earlier. Set it to FALSE to preserve that. This has the downside that the native stars warp is limited to nearest-neighbor resampling and slower. Alternative is to reassign time-metadata afterwards.
stars object with 3 dimensions and 1 attribute
attribute(s), summary of first 1e+05 cells:
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
population [count] 0 0 0 1.894517 0.1030732 1906.748 71328
dimension(s):
from to offset delta refsys x/y
x 1 1171 -415531 826.2 NAD83 / California Albers [x]
y 1 1279 458361 -826.2 NAD83 / California Albers [y]
time 1 4 2017 1 NA
stars approach: stacking raster layers
We index time layers by index number.
CA_pop_cube[,,,1]
stars object with 3 dimensions and 1 attribute
attribute(s), summary of first 1e+05 cells:
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
population [count] 0 0 0 1.894517 0.1030732 1906.748 71328
dimension(s):
from to offset delta refsys x/y
x 1 1171 -415531 826.2 NAD83 / California Albers [x]
y 1 1279 458361 -826.2 NAD83 / California Albers [y]
time 1 1 2017 1 NA
stars approach: stacking raster layers
There is also a native export function for stars objects.
NOCHMAL CHECKEN - HIER IST WAS FALSCH In theory, converting between terra and stars objects is simple. This will preserve most crucial information on cell values, spatial extent, and CRS. It can, however, mess around with your neatly prepared metadata.
# Tranform terra SpatRaster into stars object# CA_pop_stack_stars <- stars::st_as_stars(CA_pop_stack)# # class(CA_pop_stack_stars)# # print(CA_pop_stack_stars)
Converting between terra and stars
In theory, converting between terra and stars objects is simple. This will preserve most crucial information on cell values, spatial extent, and CRS. It can, however, mess around with your neatly prepared metadata.
# Tranform stars object into terra SpatRaster# CA_pop_cube_terra <- terra::rast(CA_pop_cube)# # class(CA_pop_cube_terra)# # print(CA_pop_cube_terra)